How does the proportion of people aged 65+ affect Covid-19 mortality across continents?

Team 5
December 8, 2020

No aspect of our lives has been spared by the coronavirus pandemic, and the virus has done damage that will take time to repair. It has challenged healthcare systems across the world in a way unprecedented in modern times. Older people are disproportionately affected by the COVID-19 pandemic, which has had a profound impact on research as well as on clinical service delivery. As the group at highest risk of hospitalisation and death from COVID-19, they bear the heaviest consequences of the pandemic. In this research paper, we take a closer look at the share of the population aged 65 or older and at the population densities of countries across the continents for which data is currently gathered daily.

Studies show that the risk of severe illness with COVID-19 increases with age, with older adults at highest risk (Source: Centers for Disease Control and Prevention). Although all age groups can contract COVID-19, older people face a significantly higher risk of developing severe complications because of the physiological changes that come with ageing and potential underlying health conditions. For example, people in their 50s are at higher risk of severe complications than people in their 40s. Similarly, people in their 60s or 70s are, in general, at higher risk than people in their 50s. The greatest risk of severe complications from COVID-19 is among those aged 85 or older.

Our research question is: How does the proportion of people aged 65+ affect Covid-19 mortality across continents? We are focusing on people 65 and older as the focal point of our investigation.

Is the older population affected more severely by the virus? How is the older population distributed across the continents? What are countries doing to help the older population?

The research and data provided by the WZB Institutions and Political Inequality group stand out for their detailed annotation. Even more commendable is the group's complete transparency about the data collected and how it has been processed. This allows readers, first, to gain a deeper understanding of the results and of how the statistical models work and, second, to contribute their own analysis. The process is lengthy and full of trial and error. First, one must go through all the variables and understand what they measure; the data set contains 132 columns.

Next, the team went through the data set to understand its structure. Based on the resulting table, we decided how to distribute the workload evenly, and it made sense to split it geographically. We divided the world into three vertical strips: the Americas (North America and Latin America & Caribbean), Europe and Africa, and the Asia-Pacific region.

COVID-19 mortality is the number of deaths out of the total number of infected cases. Studies have shown that COVID-19 mortality can be explained by age, obesity, and underlying diseases, such as hypertension, diabetes, and coronary heart disease, as well as clinical symptoms, complications, hospital care, previous immunity and virus mutations. (Source: Liang, Li-Lin, et al.) Countries vary widely in terms of capacities to prevent, detect and respond to disease outbreaks.
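The following is a minimal sketch of this definition in Python, whereas the original analysis appears to use R. The function name `mortality` is our own assumption, as is the choice to return `None` for a country with no recorded cases. To express the result as a percentage, as the Europe and Africa section does, multiply it by 100.

```python
from typing import Optional


def mortality(deaths_cum: float, cases_cum: float) -> Optional[float]:
    """Return cumulative deaths divided by cumulative cases.

    Returns None when a country reports no cases, because the ratio is
    undefined there (hypothetical handling, not taken from the report).
    """
    if cases_cum <= 0:
        return None
    return deaths_cum / cases_cum


# Hypothetical country: 2,000 deaths among 100,000 recorded cases.
rate = mortality(2_000, 100_000)
print(rate)  # 0.02
```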

In this research paper we are going to use these variables from the data set:

*old_perc is a variable we added to the existing data set. It expresses the population aged 65+ as a percentage of each country's total population. This makes countries of very different sizes comparable and makes the data easier to represent, interpret and analyze.

This is how we analyzed the data given to us, how we observed it and what results the data yielded.


North America and Latin America and Caribbean by Aleksandra


I chose to analyze the first part: North America, Central America and South America. My initial plan was to filter the data set for these three regions and save the result into a new variable. After skimming through the data, however, I realized that countries from all three parts are stored under the same continent value, America. I therefore filtered the data set for countries whose continent value equals America and saved the result into a new variable.
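The filtering step can be sketched in Python as follows. This assumes each country is a dictionary with a `continent` key. The rows below are a hypothetical stand-in for the WZB data, not real records, and the helper name `filter_continents` is our own.

```python
# Hypothetical stand-in for the WZB data set: one dict per country.
# Only the columns needed here are shown; names and values are illustrative.
covid = [
    {"country": "Canada", "continent": "America"},
    {"country": "Brazil", "continent": "America"},
    {"country": "Italy", "continent": "Europe"},
    {"country": "Japan", "continent": "Asia"},
    {"country": "Fiji", "continent": "Oceania"},
]


def filter_continents(rows, *continents):
    """Keep only the rows whose 'continent' value is one of `continents`."""
    wanted = set(continents)
    return [row for row in rows if row["continent"] in wanted]


covid_america = filter_continents(covid, "America")
print(len(covid_america))  # 2 in this toy example; the report's data gives 48
```

You can also pass several values, as in `filter_continents(covid, "Asia", "Oceania")`. This mirrors the Asia-Pacific filter described later in the paper.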

[1] 48

The new data set has 48 countries in its rows and the same 132 variables in its columns. For an initial glimpse of the data, I plotted a simple scatter plot of cumulative cases against cumulative deaths in the Americas to see whether there is a pattern.

The plot immediately reveals two extreme outliers. They skew the plot and prevent careful examination of the majority of values, so they must be omitted. To do so, I created a new data set containing only the countries with fewer than 2,500,000 cumulative cases. I also created a second one holding the two outliers so that they could be identified. Unsurprisingly, the country with the most cumulative Covid-19 cases is the United States, with 12,089,438 cumulative cases at the time the data was collected. The second is Brazil, with 6,052,786 cumulative cases at the time the data was collected.
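A minimal Python sketch of splitting the data at the 2,500,000 threshold follows. It keeps the outliers in a separate mapping so they can still be identified. The placeholder country names and counts are hypothetical.

```python
# Hypothetical cumulative case counts keyed by placeholder country names.
cases_cum = {
    "country_a": 12_000_000,
    "country_b": 6_000_000,
    "country_c": 400_000,
    "country_d": 90_000,
}

THRESHOLD = 2_500_000  # cut-off used in the text


def split_at_threshold(values, threshold):
    """Split a {country: value} mapping into (kept, outliers).

    Countries strictly below the threshold are kept for plotting; the rest
    are stored separately so they can still be identified.
    """
    kept = {name: v for name, v in values.items() if v < threshold}
    outliers = {name: v for name, v in values.items() if v >= threshold}
    return kept, outliers


kept, outliers = split_at_threshold(cases_cum, THRESHOLD)
print(sorted(outliers))  # ['country_a', 'country_b']
```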

Now it is possible to delve into a more detailed analysis of the remaining 46 countries of the Americas. However, the process was not as smooth as one might think. When brainstorming which variables might interest the team, we decided to analyze lockdown measures, as they acutely affect all of us and feature frequently in media headlines. After settling on the lockdown variables, I started running through the data for the Americas. I quickly realized that nearly all values for the lockdown variables were missing and marked N/A. The data set had been created too early in the pandemic to incorporate data on lockdown measures and their effectiveness.

This idea was quickly scrapped, and we had to move on to different variables. This time we made sure from the start that the variables we were interested in had a sufficient number of valid values. As a team, we decided to focus on variables describing the characteristics of populations across the world, as these might offer interesting insights about human diversity across the globe. Thus, we settled on measuring how the proportion of people aged 65+ affects Covid-19 mortality across the globe.

The first step was to measure Covid-19 mortality. In this paper, Covid-19 mortality is measured as the proportion of cumulative deaths relative to cumulative cases per million. One must keep in mind the limitations of these two variables, however, as governments around the world report data based on different definitions.

After plotting the two variables against each other, a clearer pattern emerges. Suppose mortality is read as the slope of the graph, with cumulative cases on the x-axis and cumulative deaths on the y-axis. Then there seems to be a roughly linear relationship. However, too few countries have high cumulative death counts to trust this pattern. Therefore, the same process is repeated and the outlier is filtered out. After creating a separate variable, we find that the outlying country, with cumulative deaths greater than 7,500,000, is Mexico. Plotting the new variable without Mexico shows a much stronger linear relationship: in the Americas, cumulative deaths are proportional to cumulative cases. Further research could discuss the implications of this finding and the related government policies.
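Reading mortality as the slope of deaths against cases amounts to fitting a straight line by ordinary least squares. The following is a minimal Python sketch of that idea. The numbers are hypothetical and chosen so that every country has exactly 2% mortality. The original analysis may have fitted its trend line differently.

```python
def ols_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


cases = [10_000, 20_000, 40_000, 80_000]  # hypothetical cumulative cases
deaths = [200, 400, 800, 1_600]           # hypothetical cumulative deaths
slope, intercept = ols_fit(cases, deaths)
print(slope)  # 0.02: about 2 additional deaths per 100 additional cases
```

A single extreme country can dominate the sums `sxy` and `sxx`. This is why removing Mexico changes the fitted line so visibly.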

Having defined mortality and analyzed the relationship between cumulative cases and cumulative deaths, we can move on to the variables at the centre of this paper. For this, three variables are added to the data set. First, we create a variable that combines the female and the male populations aged 65+. Second, we compute the share of the population aged 65+ relative to the country's total population and save it into a new variable. Third, we define mortality as cumulative deaths divided by cumulative cases per million.
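The first two steps can be sketched in Python as follows. The result matches the old_perc variable: the combined 65+ population as a percentage of the total population. The parameter names are hypothetical and are not the WZB column names. `pop_older` borrows the name the Europe and Africa section uses for the combined count.

```python
def older_share(pop_65_female, pop_65_male, pop_total):
    """Share of the population aged 65+, in percent (the report's old_perc).

    Parameter names are hypothetical, not the WZB column names.
    """
    pop_older = pop_65_female + pop_65_male  # women and men aged 65+
    return 100 * pop_older / pop_total


# Hypothetical country: 600,000 women and 400,000 men aged 65+ out of 10 million.
print(older_share(600_000, 400_000, 10_000_000))  # 10.0
```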

Now we can finally plot Covid-19 mortality against the share of the population aged 65 or older. While the uncertainty is quite high, the relationship between the two variables appears negligible. This is surprising, as our initial hypothesis stated that countries with a larger proportion of citizens aged 65 or older would see higher Covid-19 mortality rates. Instead, the data suggests that mortality does not change noticeably as the share of the population aged 65 or older increases. This might be due to the small data set or to powerful confounding variables.
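One way to put a number on a "negligible relationship" is Pearson's correlation coefficient. It is close to 0 when two variables have no linear association. The sketch below is in Python with hypothetical values. The report itself judges the relationship from the plot and does not state a coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical shares of population aged 65+ (percent) and mortality rates.
old_perc = [5.0, 7.0, 9.0, 11.0]
mortality_rate = [0.03, 0.01, 0.01, 0.03]
r = pearson_r(old_perc, mortality_rate)  # approximately 0: no linear trend
```

A coefficient near zero only rules out a *linear* association. Confounders such as income, which the text mentions, can still hide a real effect.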

To explore the data set further, I went back to the beginning to see which other variables are available for each country. The next step was to filter the new data set by region. This is where I ran into a drawback of the data set: it classifies the countries of the Americas into only two regions, North America and Latin America & Caribbean.

Two details were quite surprising. First, Greenland is classified as “Europe & Central Asia” by region but “America” by continent. After careful consideration, I decided to keep this data point, since its small population means it would have little effect. Second, the entire continent of South America falls under Latin America & Caribbean. After omitting the United States as an outlier because of its extreme number of cumulative cases, I was left with only two countries in the North America category (Canada and, again surprisingly, Bermuda). One country fell in the Europe & Central Asia category, and all remaining countries fell in the Latin America & Caribbean category. Thus, further analysis by region is inconclusive.

It makes sense to inspect mortality rates further by dividing the data set into three tiers:

Most countries in the Americas fall within the first tier, which again shows that most countries in the Americas have young populations. Tiers 2 and 3 have too few data points to draw a conclusion, and tier 1 shows no clear relationship either. This could have multiple explanations, but it is most likely influenced by confounding variables such as income levels. Countries with larger shares of people aged 65+ are usually wealthier countries with better healthcare systems.

A problem with the scatter plots is that many countries have similar values. Their points lie on top of each other and mask how much data is actually there. Density plots are a good alternative here, so the same relationships are revisited through density plots in the hope of discovering deeper insights. These show the overall distribution of values across the continents. Since North America has only two values, it is not visible on the graph; this shortcoming has already been discussed. The bulk of the analysis is therefore about Central and South America. The graph clearly shows that Latin American & Caribbean countries are relatively ‘young’, with only a small share of the population aged 65+. The vast majority of countries are in the 5-10% range; in fact, only Puerto Rico has more than 20% of its population aged 65+. This raises the question of whether, in Latin America & the Caribbean, other variables play a stronger role in Covid-19 mortality than the age of the population.

Surprisingly, the mortality density graph shows higher mortality rates for the countries classified as North America, which are often considered wealthier and to have better health care systems. Again, one must be careful not to look for simple explanations, as this category contains only two countries. For North America, a study of Covid-19 mortality rates state by state would be more conclusive.

Inspecting Latin America & Caribbean shows that the vast majority of countries have mortality rates below 0.35 / Million. However, Bolivia and Ecuador are extreme outliers that have mortality rates higher than 0.65 / Million.

Superimposing the mortality plot on the plot of the share of the population aged 65+ is inconclusive, however. There seems to be little to no relationship between a country's share of the population aged 65+ and its Covid-19 mortality rate. The earlier scatter plot of the share of the population aged 65+ against mortality showed the same thing: an almost horizontal trend with a slope of around 0.

Although this is not the main focus of the paper, we wanted to explore a possible confounding variable, since we could not find a strong relationship between our chosen variables. Reports of Covid-19 case explosions in migrant communities raise the question of whether mortality rates correlate with the migrant share of the total population. In the Americas, Venezuelan migrants are the most affected. “As of 30 October 2020, more than 136,000 Venezuelan migrants and refugees had returned to Venezuela from other countries in the region (IOM and UN OCHA, 2020). At its peak, 600 Venezuelans returned from Colombia daily and an average of 88 Venezuelans returned from Brazil daily via the border at Pacaraima (Coordination Platform for Refugees and Migrants from Venezuela, 2020).” (Migration Data Portal)

Our hypothesis is that countries in which migrants make up a larger share of the population have higher Covid-19 mortality rates. Graphs of mortality per million against population density and against the migrant share of the population show some of the strongest correlations found for the Americas in this paper. Migrant populations often live in denser areas, which are already more prone to a deadlier spread of infections, and both graphs are consistent with this explanation. This could be explored in further research.

To discuss the shortcomings of the data set and of the analysis further, it is useful to look at descriptive statistics. They show once again that most countries in the Americas have a relatively low number of cumulative cases. However, the large difference between the mean and the median reflects a few extreme outliers in the data set.

[1] 10687.47
[1] 6620.834
[1] 11167.48

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  201.7  1346.6  6620.8 10687.5 13965.6 44436.1 
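The three printed values above appear to be the mean, median and standard deviation, and the table looks like R's summary() output. The sketch below is a rough Python equivalent that also counts missing values, which summary() reports as NA's. It assumes missing values are stored as `None`. `statistics.stdev` uses the n - 1 denominator, like R's sd(). The "inclusive" quantile method treats the minimum and maximum as the 0th and 100th percentiles, which should match R's default quartiles.

```python
import statistics


def describe(values):
    """Summary statistics in the spirit of R's summary() plus sd().

    Missing values (None) are dropped before computing and counted
    separately, similar to the NA's column in R's output.
    """
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "min": min(present),
        "q1": q1,
        "median": statistics.median(present),
        "mean": statistics.mean(present),
        "q3": q3,
        "max": max(present),
        "sd": statistics.stdev(present),  # sample SD (n - 1), as in R's sd()
        "missing": len(values) - len(present),
    }


# Hypothetical values with one missing entry.
stats = describe([1, 2, 3, 4, 5, None])
print(stats["q1"], stats["median"], stats["q3"], stats["missing"])  # 2.0 3 4.0 1
```

A mean far above the median, as in the table above, is the signature of a right-skewed distribution with a few very large values.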

For mortality, the outliers are less obvious than for cumulative cases. In absolute terms, the difference between the median and the mean is much smaller, although this is partly because the mortality values themselves are small. Further research could explore this relationship.

[1] 0.08050732
[1] 0.004196266
[1] NA

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000000 0.0009904 0.0041963 0.0805073 0.0555201 0.7307602 

When looking at the share of migrants, we again see a large difference between the mean and the median values, suggesting there are a few extreme outliers. The histogram shows us that most countries in the Americas have migrants as a share of the population of 10% or less.

[1] 12.89577
[1] 4.805
[1] 17.08245

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.117   1.669   4.805  12.896  15.137  70.448       3 

EUROPE and AFRICA by Stefan

[1] 109

General background

On 20 November 2020, Europe surpassed 15 million cases, and cases have risen sharply since then. On 19 November 2020, Africa surpassed 2 million cases, and cases have risen steadily since then. Of all the continents, Europe has the largest number of cases to date. In Europe, the outbreak started in Italy and spread across the continent. The continent has recorded over 400,000 deaths. Africa, on the other hand, has the second-lowest number of COVID-19 cases of all continents, ahead only of Oceania. It has recorded over 50,000 deaths.

Countries vary widely in their capacity to prevent, detect and respond to disease outbreaks. In this paper, we explore the factors associated with COVID-19 mortality at the country level, specifically in Europe and Africa.

Wealthier countries tend to have larger older populations. This is partly because a strong economy allows more resources to be allocated to healthcare, which in turn gives older people good care when they need it. This is especially relevant for Europe, where older people make up a comparatively large share of the population. With that in mind, we explore whether the older population accelerates or slows the mortality rate.

Since the data set contains no age variable, we use other variables, such as the numbers of women and men aged 65+, to test our hypothesis.

VARIABLES we will use from the dataset:

To begin the analysis, we plot a basic scatter plot of cumulative cases against cumulative deaths for an initial glimpse of the data. It uses the two variables cases_cum and deaths_cum.

The plot contains four outliers that skew it and prevent careful examination of the majority of values. These four values must be omitted to allow a closer and more accurate analysis. Plotting on a logarithmic scale also helped, as it made the upward slope of cases against deaths clearly legible.

To examine the effect of population density, and to satisfy our curiosity about the effect of the disease on migrants in both Europe and Africa, we create new variables without the outliers for use in further analysis.

Now, we can plot the cumulative cases vs cumulative deaths again with the new dataset (where the four outliers are removed).

This scatter plot shows an upward curve, i.e. a positive correlation between cases_cum and deaths_cum. As the number of confirmed COVID-19 cases rises, the number of deaths rises too. However, the curve suggests that a large increase in cases is accompanied by a comparatively smaller increase in the total death count.

Now, to get a better view of how the cumulative number of cases relates to COVID-19 mortality, we plot these two variables against each other.

The plot indicates almost no correlation between COVID-19 mortality and cumulative cases overall (including all countries in Europe and Africa), at most a very slight negative one.

Residents of areas with high population density, such as large or metropolitan cities, are more likely to come into close contact with others. Any contagious disease is therefore expected to spread rapidly in dense areas. We analyze what kind of relationship exists between population density and COVID-19 cases and deaths later in this section.

To analyze how the chosen variables relate to each other, we first create a variable called pop_older that includes men and women aged 65 and over. Then, we divide pop_older by the 2019 population to get a variable called older_share. Lastly, we create a variable called mortality to show how cumulative cases and cumulative deaths are connected. It is defined as cumulative deaths divided by cumulative cases.

Next, we divide countries into three groups by the share of the population aged 65+ (low, moderate and high). This lets us see what kind of relationship exists between the two variables in each group.

Filtering countries with a share of population aged 65+ below 4.718% (LOW).

No European country falls into the low group.

Filtering countries with a share of population aged 65+ between 4.718% and 7.652% (MODERATE).

Filtering countries with a share of population aged 65+ greater than 8% (HIGH).
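The grouping above can be expressed as a small classification function. This Python sketch rests on an assumption. The text starts the high group at 8%, which would leave countries between 7.652% and 8% in no group. Here, the high group is assumed to start immediately above 7.652%.

```python
LOW_UPPER = 4.718       # below this: low share of people aged 65+
MODERATE_UPPER = 7.652  # up to and including this: moderate share


def old_tier(old_perc):
    """Assign a country to a tier by its percentage of people aged 65+.

    Assumption: tiers are contiguous, so anything above MODERATE_UPPER is
    'high' (the text uses 8 as the lower bound of the high group).
    """
    if old_perc < LOW_UPPER:
        return "low"
    if old_perc <= MODERATE_UPPER:
        return "moderate"
    return "high"


# 1.96 (Uganda) and 23.01 (Italy) are the extremes reported later in the
# text; 6.0 and 7.8 are hypothetical values.
for share in (1.96, 6.0, 7.8, 23.01):
    print(share, old_tier(share))
```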

According to the plots, the negative correlation was strongest for countries with a low share of older people, followed by those with a moderate share. The negative correlation in all three plots might be because countries adopted preventive measures over time to lessen the impact of the virus on their populations, such as better healthcare facilities and more healthcare professionals.

Survival rates seem to have improved, but rising case numbers are still driving up the total number of deaths. As more young people fell ill, the average age of people who developed COVID-19, and of those visiting emergency rooms because of it, dropped. This led to an increase in younger people hospitalized with COVID-19.

Moreover, many people at risk are taking more steps to reduce their chances of exposure to the virus. Older people and people with more underlying medical conditions are more consistent in social distancing, frequent handwashing and other protective measures.

Comparing the three plots, the countries with a comparatively low share of older people show the steepest negative slope, i.e. a more negative correlation than the other two groups. This suggests that where the older population is already small, mortality fell more as cases rose.

Countries with a moderate share of older people showed a smaller decrease in mortality, i.e. a less steep negative slope. This might be because these countries have a comparatively larger older population.

Among the countries surveyed in Europe and Africa, those with a high share of older people are advanced in terms of technology and governance and have better healthcare systems. Despite this, the scatter plot of the share of older people against COVID-19 mortality shows a strong correlation for this group.

This paper asks how the population aged 65 and above, as well as the share of migrants, affects the mortality rate in Europe and in Africa separately. To answer that question, we need to create new variables. We also pay attention to the confounding variable ‘density’ and examine how it relates to the numbers of older people and migrants. Three variables are added to the data set. First, we create a variable combining the female and the male populations aged 65+. Second, we compute the share of the population aged 65+ relative to each country's total population and save it into a new variable. Third, we define mortality as cumulative deaths divided by cumulative cases.

We are creating variables for Europe and Africa, together and separately.

Now that we have created the variable “mortality” for the whole data set, including the outliers in Europe and Africa, we can plot cumulative cases per million people. This makes them comparable with the mortality rate.
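Normalizing by population is what makes countries of very different sizes comparable. The following is a short Python sketch, assuming cumulative cases and population are known for each country. The function name `per_million` is hypothetical. Multiplying before dividing keeps the arithmetic exact for whole-number inputs like these.

```python
def per_million(count, population):
    """Express a count per million inhabitants."""
    return count * 1_000_000 / population


# Hypothetical country: 40,000 cumulative cases in a population of 500,000.
print(per_million(40_000, 500_000))  # 80000.0
```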

The graph shows a peculiar result. One outlier has around 80,000 cases per million people and a mortality rate below 2%. Several other countries have far fewer cases per million people but are outliers in their own right, with mortality rates between 6% and 8%. Overall, the mortality situation in Europe and Africa does not appear alarming, as the trend slopes downward.

Next, we plot the population of older people against the mortality rate. This puts the newly created variables in context. It also gives a clearer perspective on how the population of older people affects the mortality rate on the two continents.

The mortality rate of countries with up to 1,000,000 people aged 65 and older tends to be around 0.5 to 2.5%. There is one outlier, a country with over 2 million older people, whose mortality rate is close to 2%. Italy has the largest share of older people in Europe and Africa, a staggering 23% of its population (Eurostat).

Next, we are creating a density plot that portrays where the share of the older population tends to be in Europe.

The density curve peaks where the share of older people is 15 to 20 percent, at a density of about 0.06 to 0.08. This means that in most European countries, 15 to 20 percent of the population is aged 65 or older, which is consistent with our previous analysis.

Next, we examine the correlation between migrants and mortality in Europe and Africa. We also tried facet wrapping the results by region.

Migrants in Europe and mortality rate

Europe has a large migrant population of 21.8 million people (Eurostat), so we wanted to find out how migrants affect the overall mortality rate.

First, we look for a correlation, if there is one, using a scatter plot.

The graph shows that in countries where migrants make up around 20 percent of the population or less, the mortality rate is around 2 percent or less, which is relatively low.

Now, we are creating a density plot that portrays where the share of the older population tends to be in Africa, just like we did in Europe.

The density curve peaks where the share of older people is 0 to 3 percent, at a density of about 0.7. This means that in most African countries, only 0 to 3 percent of the population is aged 65 or older.

Migrants in Africa and mortality rate

Africa hosts the fourth-largest number of international migrants of any world region (Wikipedia). Thus, we wanted to find out how migrants affect the overall mortality rate.

First, we look for a correlation, if there is one, using a scatter plot.

The graph shows that in countries where migrants make up around 5 percent of the population or less, the mortality rate is around 2 percent or more, which is still not a high figure.

Descriptive Analysis of Europe and Africa

The descriptive statistics show the mean, median, minimum and maximum of the variables mortality, old_perc, cases_cum and deaths_cum for Europe and Africa. They give a sense of how strongly the percentage of older people relates to the overall mortality rate on these two continents, and of how hard COVID-19 struck Europe and Africa.

# A tibble: 1 x 16
  mean_old_perc median_old_perc minimum_old_perc maximum_old_perc
          <dbl>           <dbl>            <dbl>            <dbl>
1          9.98            6.45             1.96             23.0
# … with 12 more variables: mean_mortality <dbl>,
#   median_mortality <dbl>, minimum_mortality <dbl>,
#   maximum_mortality <dbl>, mean_cases_cum <dbl>,
#   median_cases_cum <dbl>, minimum_cases_cum <dbl>,
#   maximum_cases_cum <dbl>, mean_deaths_cum <dbl>,
#   median_deaths_cum <dbl>, minimum_deaths_cum <dbl>,
#   maximum_deaths_cum <dbl>
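The tibble above appears to come from a single summarise-style call that computes the mean, median, minimum and maximum of several variables. The following is a hedged Python sketch of the same idea, extended to compute the statistics per continent. The row layout, the helper name `summarise_by` and the values are hypothetical.

```python
import statistics
from collections import defaultdict


def summarise_by(rows, group_key, value_key):
    """Mean, median, min and max of `value_key` for each value of `group_key`.

    Rows where the value is missing (None) are skipped.
    """
    groups = defaultdict(list)
    for row in rows:
        value = row[value_key]
        if value is not None:
            groups[row[group_key]].append(value)
    return {
        group: {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in groups.items()
    }


# Hypothetical mortality percentages, not the report's data.
rows = [
    {"continent": "Europe", "mortality": 1.0},
    {"continent": "Europe", "mortality": 2.0},
    {"continent": "Europe", "mortality": 6.0},
    {"continent": "Africa", "mortality": 2.0},
    {"continent": "Africa", "mortality": None},
    {"continent": "Africa", "mortality": 4.0},
]
summary = summarise_by(rows, "continent", "mortality")
print(summary["Europe"])  # {'mean': 3.0, 'median': 2.0, 'min': 1.0, 'max': 6.0}
```

Grouping by continent in one pass produces separate Europe and Africa summaries, which the following subsections compute from separate data sets.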

The mean percentage of older people is 9.97, while the median is only 6.45, because most African countries have much younger populations than European ones. The lowest percentage is Uganda's, at 1.96, and the highest is Italy's, at 23.01.

The mean mortality is 1.96%, which is relatively low considering the health risks the coronavirus poses to older people. The maximum mortality is 6.5%.

The mean number of cumulative cases is 214474. This is a solid number of cases considering the more than 15 million cases in Europe and more than 2 million cases in Africa.

Descriptive Statistics for Europe

These statistics show the specifics regarding mortality, cumulative cases and cumulative deaths in Europe with outliers, by using the variable covid_europe.

# A tibble: 1 x 12
  mean_mortality median_mortality minimum_mortali… maximum_mortali…
           <dbl>            <dbl>            <dbl>            <dbl>
1           1.79             1.51                0             6.79
# … with 8 more variables: mean_cases_cum <dbl>,
#   median_cases_cum <dbl>, minimum_cases_cum <dbl>,
#   maximum_cases_cum <dbl>, mean_deaths_cum <dbl>,
#   median_deaths_cum <dbl>, minimum_deaths_cum <dbl>,
#   maximum_deaths_cum <dbl>

The above table summarizes the mortality rate, cumulative cases, deaths of different countries in Europe.

Across the countries in Europe, the mean number of cases is 272105, the median is 84603 and the maximum number of cumulative cases in a single country is 2127051. The mean number of deaths is 6036, the median is 11098 and the maximum number of deaths in a country (as far as the data set tells us) is 54626. These numbers suggest that, outliers aside, Europe is handling the pandemic comparatively well.

The mean mortality rate in Europe is 1.79%.

The standard deviation, a histogram and a plot give more detailed insight into these numbers. We therefore compute them as well, this time using new_covid_europe, the data set with the outliers removed.

[1] 31909.55
[1] 21216
[1] 33677.97

With the outliers removed, the number of cases is comparatively low, which suggests that most European countries are coping with the pandemic.

Descriptive Statistics for Africa

These statistics show the specifics regarding mortality, cumulative cases and cumulative deaths in Africa with outliers, by using the variable covid_africa.

# A tibble: 1 x 12
  mean_mortality median_mortality minimum_mortali… maximum_mortali…
           <dbl>            <dbl>            <dbl>            <dbl>
1           2.12             1.79                0             7.63
# … with 8 more variables: mean_cases_cum <dbl>,
#   median_cases_cum <dbl>, minimum_cases_cum <dbl>,
#   maximum_cases_cum <dbl>, mean_deaths_cum <dbl>,
#   median_deaths_cum <dbl>, minimum_deaths_cum <dbl>,
#   maximum_deaths_cum <dbl>

Across the countries in Africa, the mean number of cases is 37416, the median is 6205 and the maximum number of cumulative cases in a single country is 765409. The mean number of deaths is 898, the median is 108 and the maximum number of deaths in a country (as far as the data set tells us) is 20485.

The mean mortality rate in Africa is 2.11%.

The standard deviation, a histogram and a plot give more detailed insight into these numbers. We therefore compute them as well, this time using the Africa data set with the outliers removed.

[1] 131.4884
[1] 75
[1] 190.8662

With the outliers removed, these values are low, which suggests that most African countries are coping with the pandemic.

Analysis of the effect of Population Density on cases and deaths in Europe and Africa

Population density refers to the number of people living in an area per square kilometer.

Only the right-hand scatter plot (deaths_cum vs pop_density) shows a strong positive correlation between the two variables. The flatter slope of the first scatter plot might be because many countries in Europe do not have a very high population density.

In the first plot, as a country's population density increases, the number of infected cases falls slightly.

This might be because COVID-19 is highly contagious, and the risk of spread is higher where more people live per unit area. In Europe, many people live in close proximity to one another: Malta has 1,380 people per square kilometer and the Netherlands 488. Africa also has countries where many people live close together, for example Mauritius with 623 people per square kilometer and Rwanda with 499. These statistics explain the upward slope. In the second plot, the same reasoning applies: countries with more people per unit area have more infected cases and, as a result, more deaths, which gives a positive correlation.

Result: population density is positively associated with the number of cases, and even more strongly (with a steeper upward slope) with the number of COVID-19 deaths in Europe and Africa.
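Because cumulative cases and cumulative deaths are on very different scales, the steepness of their fitted lines is hard to compare directly. A minimal sketch, assuming a data frame with the data set's pop_density, cases_cum and deaths_cum columns (the helper name density_associations is hypothetical), that compares the two associations through unit-free Pearson correlation coefficients instead:

```r
# Sketch: compare how strongly population density is associated with cumulative
# cases and with cumulative deaths. Correlations are unit-free, unlike raw slopes.
density_associations <- function(df) {
  cols <- c("pop_density", "cases_cum", "deaths_cum")
  df <- df[complete.cases(df[, cols]), cols]  # drop rows with any missing value
  c(r_cases  = cor(df$pop_density, df$cases_cum),
    r_deaths = cor(df$pop_density, df$deaths_cum))
}

# Toy usage with made-up values (not from the data set):
toy <- data.frame(pop_density = c(50, 100, 200, 400, NA),
                  cases_cum   = c(1000, 900, 1500, 1200, 800),
                  deaths_cum  = c(10, 20, 40, 80, 5))
density_associations(toy)
```

A correlation close to 1 for deaths and a weaker one for cases would correspond to the "steeper" deaths plot described above.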

Drawbacks of this analysis: the data set handles countries that belong to two or three continents inconsistently. For example, Malta is placed both in Europe and in North Africa, and Azerbaijan, Armenia and Georgia are placed both in Europe and in Asia. This is why some results may be unexpected and why many rows had to be deleted. Classifying countries this way created confusion when we tried to treat Europe and Africa as two separate continents. An additional classification method is needed that assigns each country to exactly one continent. One option is to classify countries by area, assigning each country to the continent that contains the larger share of its territory.
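As a sketch of how such double listings could be detected and resolved, assuming a data frame with country and continent columns as in the data set: the helper names and the primary-continent lookup (for example, counting Georgia as Asian) are illustrative choices, not an official classification.

```r
# Sketch: detect countries listed under more than one continent and keep one row
# per country. Helper names and the lookup below are illustrative assumptions.
find_multi_continent <- function(df) {
  n_cont <- tapply(df$continent, df$country, function(x) length(unique(x)))
  names(n_cont)[n_cont > 1]
}

# primary: named character vector mapping country -> continent to keep.
# Countries without an entry in `primary` keep all of their rows.
keep_primary_continent <- function(df, primary) {
  has_rule <- df$country %in% names(primary)
  keep <- !has_rule | df$continent == primary[df$country]
  df[keep, , drop = FALSE]
}

# Toy usage mirroring the double listings described above:
toy <- data.frame(country   = c("Malta", "Malta", "Georgia", "Georgia", "France"),
                  continent = c("Europe", "Africa", "Europe", "Asia", "Europe"),
                  stringsAsFactors = FALSE)
find_multi_continent(toy)  # "Georgia" "Malta"
keep_primary_continent(toy, c(Malta = "Europe", Georgia = "Asia"))
```

An area-based rule would only change how the `primary` lookup is filled in; the filtering step stays the same.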


ASIA-PACIFIC REGION by Ronit

I’ll be analyzing the third part: Asia, Oceania and Australia. First, I filter the data set to include only the countries in this region and store the result in a new variable called ‘covid_asia’. Australia is already included in the Oceania region of the data set, so I filtered only for Asia and Oceania. Using the ‘nrow’ function, I found that this region contains 52 rows, i.e. 52 countries.

[1] 52
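A base-R sketch of this filtering step, assuming the full data frame is called covid (a hypothetical name) and has a character column continent; the original analysis may have used dplyr::filter() instead, and the function name filter_asia_pacific is hypothetical.

```r
# Keep only countries whose continent is Asia or Oceania (Australia is in Oceania).
# %in% returns FALSE for NA continents, so rows with a missing continent are dropped.
filter_asia_pacific <- function(covid) {
  covid[covid$continent %in% c("Asia", "Oceania"), , drop = FALSE]
}

# Usage on the real data (not run here):
# covid_asia <- filter_asia_pacific(covid)
# nrow(covid_asia)
```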

General situation or background of this region

Asia-Pacific is among the regions hardest hit by COVID-19. About 80 percent of the global total of people affected by disasters and COVID-19 in 2020 were in the Asia-Pacific.

“Asia surpassed 10 million infections of the coronavirus previous month, as cases continue to mount. Behind only Latin America, Asia accounts for about one-fourth of the global caseload of 42.1 million of the virus. With over 163,000 deaths, the region accounts for some 14% of the global COVID-19 toll” (Source: News18 India, Asia becomes second region to exceed 10 million coronavirus cases, 24 Oct 2020, Reuters).

The coronavirus pandemic has challenged healthcare systems across the world in a way not seen in modern times. Older people are disproportionately affected by the COVID-19 pandemic, which has had a profound impact on research as well as clinical service delivery. They are bearing the consequences of the pandemic as a group at the highest risk of hospitalization and death from COVID-19 illness. In this research paper, we’ll take a closer look at how the older population and the population densities of specific countries in the Asia-Pacific region relate to their mortality rates, and whether they are associated at all.

As studies have shown, the risk of severe illness from COVID-19 increases with age, with older adults at highest risk. Although all age groups are at risk of contracting COVID-19, older people face a significant risk of developing severe illness if they contract the disease, due to physiological changes that come with ageing and potential underlying health conditions.

For example, people in their 50s are at higher risk for severe illness than people in their 40s. Similarly, people in their 60s or 70s are, in general, at higher risk for severe illness than people in their 50s. The greatest risk for severe illness from COVID-19 is among those aged 85 or older.

The table below shows the age groups at higher risk of hospitalization and death from COVID-19.

(Source: Centers for Disease Control and Prevention)

In our research, we consider only the population aged 65 and above, i.e. older adults of both sexes.


COVID-19 mortality here means the case fatality rate: the number of deaths divided by the total number of confirmed cases, expressed as a percentage. Studies have shown that COVID-19 mortality can be explained by age, obesity and underlying diseases such as hypertension, diabetes and coronary heart disease, as well as by clinical symptoms, complications, hospital care, previous immunity and virus mutations.
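In code, this definition is a simple ratio. A minimal sketch (the function name mortality_rate is hypothetical) that returns NA for countries with no recorded cases instead of dividing by zero:

```r
# COVID-19 mortality (case fatality rate) in percent:
# cumulative deaths divided by cumulative confirmed cases, times 100.
mortality_rate <- function(deaths_cum, cases_cum) {
  ifelse(!is.na(cases_cum) & cases_cum > 0,
         deaths_cum / cases_cum * 100,
         NA_real_)
}

# Yemen's figures from the data set: 608 deaths out of 2093 cases
mortality_rate(608, 2093)  # about 29.05
```

The function is vectorized, so it can be applied directly to the deaths_cum and cases_cum columns of a data frame.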

Countries vary widely in their capacity to prevent, detect and respond to disease outbreaks. In this paper, I aim to explore the factors associated with COVID-19 mortality at the country level, specifically in the Asia-Pacific region.


VARIABLES that I’ll be using from the data set:


To begin the analysis, we first draw a basic scatter plot of cumulative cases (cases_cum, X-axis) against cumulative deaths (deaths_cum, Y-axis) to get an initial view of the pattern in this region.

Since there is one outlier, removing it allows a closer and more accurate analysis. Hence, I filter the data set again and store the result in a new variable called ‘new_covid_asia’, keeping only the countries with fewer than 2,500,000 total cases.

# A tibble: 1 x 132
     X1 geoid2 date       month   day  year elapsed date_rep   cases
  <dbl> <chr>  <date>     <dbl> <dbl> <dbl>   <dbl> <date>     <dbl>
1    90 IND    2020-11-22    11    21  2020     326 2020-11-22 45209
# … with 123 more variables: deaths <dbl>, country <chr>,
#   population_2019 <dbl>, continent <chr>,
#   Cumulative_number_for_14_days_of_COVID.19_cases_per_100000 <dbl>,
#   cases_cum <dbl>, deaths_cum <dbl>, deaths_cum_log <dbl>,
#   deaths_cum_l7 <dbl>, deaths_cum_g7 <dbl>, region <chr>,
#   gov_effect <dbl>, trade <dbl>, ineq <dbl>, gdp_pc <dbl>,
#   pop_tot <dbl>, older_m <dbl>, older_f <dbl>, air_travel <dbl>,
#   fdi <dbl>, pop_density <dbl>, urban <dbl>, migration_share <dbl>,
#   oil <dbl>, soc_insur_cov <dbl>, soc_contrib <dbl>,
#   soc_safety <dbl>, pop_below14_2018 <dbl>, polity <dbl>,
#   gini <dbl>, elf_epr <dbl>, rq_polarization <dbl>,
#   count_powerless <dbl>, share_powerless <dbl>,
#   media_critical <dbl>, journal_harass <dbl>,
#   health_equality <dbl>, property_rights <dbl>,
#   transparent_law <dbl>, bureaucracy_corrupt <dbl>,
#   polar_rile <dbl>, trust_people <dbl>, trust_gov <dbl>,
#   electoral_pop <dbl>, federal_ind <dbl>, checks_veto <dbl>,
#   polariz_veto <dbl>, dist_senate <dbl>, dist_presid <dbl>,
#   dist_parlm <dbl>, dist_anyelec <dbl>, elect_pressure <dbl>,
#   pos_gov_lr <dbl>, woman_leader <dbl>, infections_mers <dbl>,
#   infections_sars <dbl>, infections_ebola <dbl>, infection <dbl>,
#   med_age_2013 <dbl>, vdem_libdem <dbl>, al_etfra <dbl>,
#   al_religfra <dbl>, fe_etfra <dbl>, vdem_mecorrpt <dbl>,
#   share_health_ins <dbl>, pandemic_prep <dbl>, pop_den_2018 <dbl>,
#   life_exp_2017 <dbl>, resp_disease_prev <dbl>, detect_index <dbl>,
#   doctors_pc <dbl>, hosp_beds_pc <dbl>, literacy_rate <dbl>,
#   healthcare_qual <dbl>, acc_sanitation <dbl>, health_exp_pc <dbl>,
#   hdi <dbl>, health_index <dbl>, respond_index <dbl>,
#   state_fragility <dbl>, pr <dbl>, share_older <dbl>,
#   pop_tot_log <dbl>, pop_density_log <dbl>, distancing_bin <lgl>,
#   lockdown_bin <lgl>, lockdown_n <lgl>, distancing_n <lgl>,
#   days_rel_lockdown <lgl>, days_rel_distancing <lgl>, retail <lgl>,
#   grocery <lgl>, parks <lgl>, transit <lgl>, work <lgl>,
#   residential <lgl>, mobility_index <dbl>, stringency <lgl>,
#   C1_School.closing <lgl>, C2_Workplace.closing <lgl>, …

As expected, the outlier is India, the second most populous and second most COVID-19-affected country in the world, with over 9.7 million coronavirus cases as of December 10, 2020.
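A base-R sketch of the outlier filter, using the 2,500,000-case threshold from the text. The function names drop_case_outliers and case_outliers are hypothetical; covid_asia and new_covid_asia are the variable names used in this section.

```r
# Split countries at the case-count threshold used in the text.
# which() drops rows whose cases_cum is NA instead of returning rows of NAs.
drop_case_outliers <- function(df, max_cases = 2500000) {
  df[which(df$cases_cum < max_cases), , drop = FALSE]
}

case_outliers <- function(df, max_cases = 2500000) {
  df[which(df$cases_cum >= max_cases), , drop = FALSE]
}

# Usage on the real data (not run here):
# new_covid_asia <- drop_case_outliers(covid_asia)
# case_outliers(covid_asia)   # the removed row(s), here India
```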

Now we can plot cumulative cases vs cumulative deaths again with the new data set (with the outlier removed) and interpret the plot better.

This scatter plot shows an upward, exponential-looking curve, i.e. a positive correlation between cases_cum and deaths_cum: as the number of COVID-19 cases rises, deaths also rise, and at the upper end of the curve a small increase in cases goes with a large increase in deaths.


Similarly, I’ll be analyzing the relationship between COVID-19 mortality (in percentage) and cumulative cases.

All countries have mortality rates below 10% except Yemen, whose mortality rate is 29%.

# A tibble: 1 x 8
  country pop_density cases_cum deaths_cum old_perc continent region
  <chr>         <dbl>     <dbl>      <dbl>    <dbl> <chr>     <chr> 
1 Yemen            56      2093        608     2.90 Asia      Middl…
# … with 1 more variable: income <chr>

Now, after removing Yemen from the plot:


Call:
lm(formula = (deaths_cum/cases_cum) * 100 ~ cases_cum, data = covid_2)

Coefficients:
(Intercept)    cases_cum  
  2.179e+00   -7.657e-08  

The fitted regression indicates a very slight negative relationship between COVID-19 mortality and cumulative cases for all countries except the outliers (India and Yemen): the slope coefficient for cases_cum is negative (-7.657e-08). The positive value 2.179 is the intercept, i.e. the fitted mortality rate (in %) at zero cumulative cases, not the slope.
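To avoid confusing the two coefficients, they can be extracted by position and given explicit names. A minimal sketch, assuming a data frame with deaths_cum and cases_cum columns and reusing the formula from the lm() call above; the function name mortality_trend is hypothetical.

```r
# Sketch: fit mortality (%) on cumulative cases, as in the lm() call above, and
# return the intercept and the slope under explicit names so they are not confused.
mortality_trend <- function(df) {
  fit <- lm((deaths_cum / cases_cum) * 100 ~ cases_cum, data = df)
  co <- coef(fit)  # first element: intercept, second: slope for cases_cum
  c(intercept = unname(co[1]), slope = unname(co[2]))
}

# Toy usage with made-up values where mortality falls by 1 point per 100 cases:
toy <- data.frame(cases_cum = c(100, 200, 300, 400), deaths_cum = c(4, 6, 6, 4))
mortality_trend(toy)  # intercept 5, slope -0.01 (up to floating-point rounding)
```

For the model printed above, this would report an intercept of 2.179 and a slope of -7.657e-08: the fitted mortality rate falls very slightly as cumulative cases rise.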

A closer analysis of individual countries follows, based on their percentage of older population.

Now, we’ll divide the countries into three groups based on the percentage of their population aged 65 and older.

In order to see what kind of relationship exists between mortality and cumulative cases in each group, we’ll be:

Filtering countries whose older population is below 4.718% (LOW %).

Filtering countries whose older population is between 4.718% and 7.652% (MODERATE %).

Filtering countries whose older population is greater than 8% (HIGH %).
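A sketch of this grouping with base R's cut(), using the 4.718 and 7.652 thresholds from the text. The HIGH cutoff of 8 leaves countries between 7.652 and 8 in no group, so the sketch assumes HIGH starts at 7.652 so that every country is assigned; this may differ from the original filtering. The function name old_group is hypothetical.

```r
# Assign each country to LOW / MODERATE / HIGH by its percentage of people aged 65+.
# right = FALSE makes intervals [lower, upper): LOW < 4.718 <= MODERATE < 7.652 <= HIGH.
old_group <- function(old_perc) {
  cut(old_perc,
      breaks = c(-Inf, 4.718, 7.652, Inf),
      labels = c("LOW", "MODERATE", "HIGH"),
      right = FALSE)
}

# Values from this section: Yemen (2.90), the median (5.180) and the maximum (28.002)
old_group(c(2.90, 5.180, 28.002))  # LOW, MODERATE, HIGH
```

Splitting the data frame by this factor (for example with split()) would then give the three subsets plotted below.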

According to the plots, the negative correlation was most pronounced for countries with a low percentage of older population, followed by those with a moderate percentage. I had tried a logarithmic scale for these plots earlier, but the plots without a log scale showed the pattern more clearly. The overall negative correlation in all three plots might be because countries, over time, adopted preventive measures, including better healthcare facilities and more healthcare professionals, to lessen the impact of the virus on their people.

Survival rates seem to have improved, but rising case numbers are causing the total number of deaths to increase. The average age of people who developed COVID-19, and of those visiting emergency rooms because of the disease, dropped as more young people fell ill. Thus, more younger people were hospitalized with COVID-19.

Moreover, many people at risk are also taking more steps to reduce their chances of being exposed to the virus. People who are older and have more underlying medical conditions are more consistently practicing social distancing, frequent handwashing and other protective measures.

Comparing the three plots, we see that countries with a comparatively small older population showed a steeper negative slope (a more negative correlation than the other two groups). This suggests that where the older population was already small, the mortality rate fell more as cases rose.

Countries with a moderate percentage of older population showed a smaller decrease in mortality rate, i.e. a less steep negative slope than the low group. This might be because these countries have a comparatively larger proportion of older people.

Based on a survey of the countries in Asia-Pacific, the countries with a high percentage of older people tend to be advanced in terms of technology and government and have better healthcare systems. Hence, we see very little or no correlation in the scatter plot for the high % group.


Descriptive Analysis

   country           pop_density       cases_cum      
 Length:47          Min.   :   2.0   Min.   :      1  
 Class :character   1st Qu.:  32.5   1st Qu.:   1489  
 Mode  :character   Median :  93.0   Median :  69581  
                    Mean   : 398.0   Mean   : 319030  
                    3rd Qu.: 248.0   3rd Qu.: 163312  
                    Max.   :8358.0   Max.   :9095806  
   deaths_cum          old_perc       continent        
 Min.   :     0.0   Min.   : 1.157   Length:47         
 1st Qu.:    26.5   1st Qu.: 3.554   Class :character  
 Median :   603.0   Median : 5.180   Mode  :character  
 Mean   :  5567.3   Mean   : 6.593                     
 3rd Qu.:  2073.0   3rd Qu.: 7.413                     
 Max.   :133227.0   Max.   :28.002                     
    region             income         
 Length:47          Length:47         
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      

The table above summarizes population density, cumulative cases, cumulative deaths and the percentage of older population (old_perc) for the countries in the Asia-Pacific region.

Descriptive statistics

For the 47 countries summarized above, the mean COVID-19 mortality rate was 2.15%, the mean number of cumulative cases was 319030 and the mean number of cumulative deaths was 5567. The mean percentage of older population across all countries in the Asia-Pacific region was 6.59%.
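The mean mortality rate of 2.15% is an average of per-country rates, so every country counts equally regardless of size. Dividing total deaths by total cases instead gives a pooled rate; from the means in the summary table this is 5567.3 / 319030, roughly 1.7%. A sketch computing both, assuming the columns shown in the summary table; the function name describe_region is hypothetical.

```r
# Descriptive statistics for a region. mean_mortality averages per-country rates
# (each country counts equally); pooled_mortality divides total deaths by total cases.
describe_region <- function(df) {
  ok <- !is.na(df$cases_cum) & !is.na(df$deaths_cum) & df$cases_cum > 0
  rate <- df$deaths_cum[ok] / df$cases_cum[ok] * 100
  c(n_countries      = nrow(df),
    mean_mortality   = mean(rate),
    pooled_mortality = sum(df$deaths_cum[ok]) / sum(df$cases_cum[ok]) * 100,
    mean_cases       = mean(df$cases_cum, na.rm = TRUE),
    mean_deaths      = mean(df$deaths_cum, na.rm = TRUE),
    mean_old_perc    = mean(df$old_perc, na.rm = TRUE))
}

# Toy usage with made-up values (not from the data set):
toy <- data.frame(cases_cum = c(100, 1000), deaths_cum = c(1, 30), old_perc = c(4, 8))
describe_region(toy)  # mean_mortality 2, pooled_mortality about 2.82
```

The two measures answer different questions: the mean of rates describes a typical country, while the pooled rate describes the region's population of confirmed cases as a whole.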


I expect that residents of areas with high population density, such as large metropolitan cities, are more likely to come into close contact with others, so a contagious disease should spread faster in dense areas. Next, I analyze what kind of relationship exists between population density and mortality.

Population density is the number of people living per square kilometer of land.

I’ve used a logarithmic scale here so that the points are more spread out, which makes the pattern easier to analyze. Surprisingly, the mortality rate seems to decrease with increasing population density. Looking at individual countries, I found that countries with higher densities have noticeably lower virus-related death rates than countries with lower densities, possibly due to superior healthcare systems. High-density cities and countries may offer more opportunities for crowding, but in Asia, proper public health precautions have spared many countries from the worst.
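A sketch of this comparison on a log scale, assuming the pop_density, cases_cum and deaths_cum columns; log10 matches a logarithmic axis, and Spearman's rank correlation is shown alongside because, unlike Pearson's, it does not change when density is log-transformed. The function name density_mortality_cor is hypothetical.

```r
# Correlation between mortality (%) and population density, using log10(density)
# for Pearson's r. Countries with zero or missing density or cases are dropped.
density_mortality_cor <- function(df) {
  rate <- df$deaths_cum / df$cases_cum * 100
  ok <- is.finite(rate) & !is.na(df$pop_density) & df$pop_density > 0
  dens <- df$pop_density[ok]
  c(pearson_log10 = cor(log10(dens), rate[ok]),
    spearman      = cor(dens, rate[ok], method = "spearman"))
}

# Toy usage with made-up values where mortality falls as density rises:
toy <- data.frame(pop_density = c(10, 100, 1000, 10000),
                  cases_cum   = c(100, 100, 100, 100),
                  deaths_cum  = c(3, 2, 1, 0))
density_mortality_cor(toy)  # both correlations about -1
```

A negative value for both measures would match the decreasing pattern described above; checking Spearman's coefficient guards against a conclusion that depends only on the choice of axis scale.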

The plot shows that higher-density countries were actually associated with lower mortality rates, possibly because residents were more strictly following social-distancing guidelines or had better access to health care. Their superior health and educational systems could help mitigate the full impact of the disease for those who are infected, leading to higher rates of recovery and lower rates of mortality. Dense areas may be more likely to put in place policies that foster social distancing, thus reducing actual rates of infection or simply leading to greater social distancing due to greater public awareness of the threat. In addition, it is possible that denser environments make it easier for people to stay somewhat connected with neighbors, families, and friends while they are sheltering in place.

On the other hand, lower-density countries, even if their residents have fewer contacts, may lack the services needed to support patients, which could result in higher mortality; this is consistent with the plot above.


Now I’ll explore our confounding variable, migration_share. Reports of COVID-19 case explosions in migrant communities raise the question of whether mortality rates are correlated with the migrant population as a share of the total population. As is the case with America, the histogram shows that in most countries in Asia-Pacific, migrants make up 10% or less of the population.
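A sketch of the counting behind this histogram, assuming migration_share holds one percentage between 0 and 100 per country; the function name migration_summary is hypothetical, and the 10% cutoff is the one used in the text.

```r
# Sketch: share of countries whose migrants make up at most `cutoff` percent of the
# population, plus the counts per 10-point histogram bin (no plot is drawn).
migration_summary <- function(migration_share, cutoff = 10) {
  x <- migration_share[!is.na(migration_share)]
  upper <- max(10, ceiling(max(x) / 10) * 10)  # last break covers the largest value
  h <- hist(x, breaks = seq(0, upper, by = 10), plot = FALSE)
  list(share_at_or_below = mean(x <= cutoff), bin_counts = h$counts)
}

# Toy usage with made-up values (not from the data set):
migration_summary(c(1, 5, 9, 10, 25, NA))
```

With hist()'s default right-closed bins, a country at exactly 10% falls in the first bin, which matches the "10% or less" wording.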

This histogram shows that most countries have a mortality rate below 10%, with Yemen an outlier at 29%.

# A tibble: 1 x 132
     X1 geoid2 date       month   day  year elapsed date_rep   cases
  <dbl> <chr>  <date>     <dbl> <dbl> <dbl>   <dbl> <date>     <dbl>
1   206 YEM    2020-11-22    11    21  2020     326 2020-11-22     3
# … with 123 more variables (the same columns as in the India output above)


Conclusion

During our individual analyses, we started to notice indications that our hypothesis might be on shaky ground. Nevertheless, we decided to continue exploring the variables. After analyzing all the countries and continents with the data available in the data set, and using various types of plots to show the relationships between the chosen variables, we conclude that the COVID-19 mortality rate does seem to be affected by the proportion of the population aged 65 or older, by population density and by the migrant share of the population, but not in the way we expected. We were naturally inclined to hypothesize that a larger older population means higher mortality, only to see that hypothesis contradicted by the data. We therefore could not conclude whether the effect of the older population was positive or negative, because some countries (due to various other factors) showed a positive correlation while others did not. Population density, however, had a positive correlation in most countries, if not all.

The first problem appeared at the very beginning of our project, when we wanted to measure the effect of lockdown measures on the mortality rate. The data on lockdown measures proved inconclusive, which pushed us to pivot our research question to the effect of the population aged 65 and older on COVID-19 mortality. However, incomplete data was not limited to the “lockdown measures” variables. Many countries on every continent covered by the data set had no information for the variables we chose to support our analysis. Some countries in Europe, Africa, Asia and the Americas had no cumulative cases or deaths recorded, which produced “NA” values and made it difficult to gather the data and put it into context. The next problem was that some countries lying at the crossroads of two or even three continents (where there are no distinct borders, so the country belongs to several continents) were listed under one continent and then again under another. This created additional points in our scatter and density plots, which made it difficult to distinguish true data from what was essentially “double data”. The third major problem was that some countries were classified inconsistently between region and continent; for example, Greenland was classified as “Europe & Central Asia” as a region, but as “America” as a continent. The other problems were smaller, and we were able to resolve them quickly.

Even though the results were not what we expected, we are far from discouraged. On the contrary, we are now more intrigued by how different variables interact and form relationships. That is the beauty of data science: sometimes we do not get the results we hope for, but that motivates us to find the missing pieces of the puzzle and work with the results we have. Who knows, we might even discover that a small error was preventing us from seeing what is right in front of our eyes.


References

Source: Ec.europa.eu. 2020. A Look At The Lives Of The Elderly In The EU Today.

https://ec.europa.eu/eurostat/cache/infographs/elderly/index.html#:~:text=In%202016%2C%2019.2%25%20of%20the,lowest%20in%20Ireland%20(13.2%25).

Source: Ec.europa.eu. 2020. Migration And Migrant Population Statistics - Statistics Explained.

https://ec.europa.eu/eurostat/statistics-explained/index.php/Migration_and_migrant_population_statistics#:~:text=2.4%20million%20immigrants%20entered%20the,non%2DEU%2D27%20citizens.

Source: En.wikipedia.org. 2020. Intra-African Migration.

https://en.wikipedia.org/wiki/Intra-African_migration

Source: New England Journal of Medicine. 2020. Covid-19 and Immunity in Aging Populations: A New Research Agenda.

https://www.nejm.org/doi/full/10.1056/NEJMp2006761

Source: Centers for Disease Control and Prevention. 2020. COVID-19 And Your Health.

https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/older-adults.html

Source: Who.int. 2020. WHO Delivers Advice And Support For Older People During COVID-19.

https://www.who.int/news-room/feature-stories/detail/who-delivers-advice-and-support-for-older-people-during-covid-19#:~:text=Although%20all%20age%20groups%20are,potential%20underlying%20health%20conditions.

Source: Centers for Disease Control and Prevention. 2020. Older Adults at Greater Risk of Requiring Hospitalization or Dying if Diagnosed with COVID-19.

https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/older-adults.html#:~:text=Age%20Increases%20Risk%20for%20Severe%20Illness&text=The%20greatest%20risk%20for%20severe,intensive%20care%2C%20or%20a

Source: Liang, Li-Lin, et al. “Covid-19 mortality is negatively associated with test number and government effectiveness.” Scientific reports 10.1 (2020): 1-7.

https://www.nature.com/articles/s41598-020-68862-x

Source: Migrationdataportal.org. 2020. Migration Data Relevant for the COVID-19 Pandemic.

https://migrationdataportal.org/themen/migration-data-relevant-covid-19-pandemic

Source: Johns Hopkins Coronavirus Resource Center. 2020. Mortality Analyses - Johns Hopkins Coronavirus Resource Center.

https://coronavirus.jhu.edu/data/mortality

Source: Unstats.un.org. 2020. United Nations World Data Forum.

https://unstats.un.org/unsd/undataforum/blog/Older-people-and-age-disaggregated-COVID-19-mortality/